A piggyback method to collect statistics for query optimization in database management systems
Authors
Abstract
A database management system (DBMS) usually performs query optimization based on statistical information about the data in the underlying database. Out-of-date statistics may lead to inefficient query processing. Existing solutions to this problem have drawbacks such as a heavy administrative burden, high system load, and tardy updates. To overcome these drawbacks, this paper proposes a new approach, called the piggyback method. The key idea is to piggyback additional retrievals onto the processing of a user query in order to collect more up-to-date statistics, which are then used to optimize the processing of subsequent queries. To specify the piggybacked queries, the paper defines basic piggybacking operators. Using these operators, it introduces several types of piggybacking, such as vertical, horizontal, mixed vertical and horizontal, and multi-query piggybacking. It also studies which statistics can be obtained from different access methods by applying piggyback analysis during query processing. To meet users' different requirements for the associated overhead, several piggybacking levels are suggested. Other related issues, including initial statistics, piggybacking time, and parallelism, are discussed. Our analysis shows that the piggyback method is promising both for improving the quality of query optimization in a DBMS and for reducing the user's administrative burden of maintaining an efficient DBMS.
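The abstract does not give the definitions of the piggybacking operators, so what follows is only a minimal Python sketch of the general idea under assumptions of our own: the user query is a filtered table scan, and because the scan already reads every row, it also updates per-column statistics (row count, null count, min/max, exact distinct values) for all columns, including those the query does not reference. `ColumnStats` and `scan_with_piggyback` are hypothetical names for illustration, not the paper's operators.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class ColumnStats:
    """Hypothetical per-column statistics collected as a side effect of a scan."""
    count: int = 0
    nulls: int = 0
    low: Any = None
    high: Any = None
    distinct: set = field(default_factory=set)  # exact set: a simplification

    def observe(self, value: Any) -> None:
        self.count += 1
        if value is None:
            self.nulls += 1
            return
        self.distinct.add(value)
        if self.low is None or value < self.low:
            self.low = value
        if self.high is None or value > self.high:
            self.high = value


def scan_with_piggyback(rows: List[Row],
                        predicate: Callable[[Row], bool],
                        stats: Dict[str, ColumnStats]) -> List[Row]:
    """Answer the user query (a filtered scan) and, since each row is already
    in hand, piggyback statistics collection on every column of every row."""
    result = []
    for row in rows:
        for col, value in row.items():
            stats.setdefault(col, ColumnStats()).observe(value)
        if predicate(row):
            result.append(row)
    return result


# Usage: the query only filters on "dept", but "salary" statistics are refreshed too.
table = [
    {"id": 1, "dept": "sales", "salary": 50},
    {"id": 2, "dept": "eng", "salary": 80},
    {"id": 3, "dept": "eng", "salary": None},
]
stats: Dict[str, ColumnStats] = {}
hits = scan_with_piggyback(table, lambda r: r["dept"] == "eng", stats)
print(len(hits))  # 2
print(stats["salary"].low, stats["salary"].high, stats["salary"].nulls)  # 50 80 1
```

Keeping an exact set of distinct values grows with the data; a practical system would more likely keep a bounded summary (a sample or sketch), and choosing how much to collect per query corresponds loosely to the overhead-versus-freshness trade-off the abstract addresses with piggybacking levels.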
Related papers
Piggyback Statistics Collection for Query Optimization: Towards a Self-Maintaining Database Management System
A database management system (DBMS) performs query optimization based on statistical information about data in the underlying database. Out-of-date statistics may lead to inefficient query processing in the system. The existing utility method, which collects statistics in batch mode, suffers from drawbacks such as heavy administrative burden, high system load and tardy updates. In this paper, w...
An integrated method for estimating selectivities in a multidatabase system
A multidatabase system (MDBS) integrates information from autonomous local databases managed by different database management systems (DBMSs) in a distributed environment. A number of challenges are raised for query optimization in such an MDBS. One of the major challenges is that some local optimization information may not be available at the global level. We recently proposed a query sampling m...
Determining Essential Statistics for Cost Based Optimization of an ETL Workflow
Many of the ETL products in the market today provide tools for the design of ETL workflows, with very little or no support for optimization of such workflows. Optimization of ETL workflows poses several new challenges compared to traditional query optimization in database systems. There have been many attempts both in the industry and the research community to support cost-based optimization techniq...
Adaptive Query Processing: Dealing with Incomplete and Uncertain Statistics
The standard Database Management Systems (DBMS) query processing model picks a single nonadaptive plan and executes it to completion. The chosen plan aims to minimize running time by carefully optimizing the use of secondary storage, memory, and CPU. DBMS optimizers estimate plan costs by using statistics: information describing the datasets, the queries, and the system. When statistics needed t...
Query Optimization in Dynamic Environments
Most modern applications deal with very large amounts of data. Having to deal with such huge amounts of data is in itself a challenge. This challenge is complicated even more by the fact that, in many cases, this data is constantly changing and evolving. For instance, relational databases that handle the data of day-to-day transactional applications often have tables with very high data change ...